---
id: refactoring
title: Refactoring
---

It may seem that the data is extracted and the crawler is done, but honestly, this is just the beginning. For the sake of brevity, we've completely omitted error handling, proxies, logging, architecture, tests, documentation and other stuff that a reliable software should have. The good thing is, error handling is mostly done by Crawlee itself, so no worries on that front, unless you need some custom magic.

:::info Navigating automatic bot-protextion avoidance

You might be wondering about the **anti-blocking, bot-protection avoiding stealthy features** and why we haven't highlighted them yet. The reason is straightforward: these features are **automatically used** within the default configuration, providing a smooth start without manual adjustments.

:::

{/* TODO: add this to the info once the relevant guide is ready

However, the default configuration, while powerful, may not cover every scenario.

If you want to learn more, browse the [Avoid getting blocked](../guides/avoid-blocking), [Proxy management](../guides/proxy-management) and [Session management](../guides/session-management) guides.
*/}

To promote good coding practices, let's look at how you can use a `Router` class to better structure your crawler code.

## Request routing

In the following code, we've made several changes:

- Split the code into multiple files.
- Added custom instance of `Router` to make our routing cleaner, without if clauses.
- Moved route definitions to a separate `routes.py` file.
- Simplified the `main.py` file to focus on the general structure of the crawler.

### Routes file

First, let's define our routes in a separate file:

```python title="src/routes.py"
from crawlee.basic_crawler import Router
from crawlee.playwright_crawler import PlaywrightCrawlingContext

router = Router[PlaywrightCrawlingContext]()


@router.default_handler
async def default_handler(context: PlaywrightCrawlingContext) -> None:
    # This is a fallback route which will handle the start URL.
    context.log.info(f'default_handler is processing {context.request.url}')

    await context.page.wait_for_selector('.collection-block-item')

    await context.enqueue_links(
        selector='.collection-block-item',
        label='CATEGORY',
    )


@router.handler('CATEGORY')
async def category_handler(context: PlaywrightCrawlingContext) -> None:
    # This replaces the context.request.label == CATEGORY branch of the if clause.
    context.log.info(f'category_handler is processing {context.request.url}')

    await context.page.wait_for_selector('.product-item > a')

    await context.enqueue_links(
        selector='.product-item > a',
        label='DETAIL',
    )

    next_button = await context.page.query_selector('a.pagination__next')

    if next_button:
        await context.enqueue_links(
            selector='a.pagination__next',
            label='CATEGORY',
        )


@router.handler('DETAIL')
async def detail_handler(context: PlaywrightCrawlingContext) -> None:
    # This replaces the context.request.label == DETAIL branch of the if clause.
    context.log.info(f'detail_handler is processing {context.request.url}')

    url_part = context.request.url.split('/').pop()
    manufacturer = url_part.split('-')[0]

    title = await context.page.locator('.product-meta h1').text_content()

    sku = await context.page.locator('span.product-meta__sku-number').text_content()

    price_element = context.page.locator('span.price', has_text='$').first
    current_price_string = await price_element.text_content() or ''
    raw_price = current_price_string.split('$')[1]
    price = float(raw_price.replace(',', ''))

    in_stock_element = context.page.locator(
        selector='span.product-form__inventory',
        has_text='In stock',
    ).first
    in_stock = await in_stock_element.count() > 0

    data = {
        'manufacturer': manufacturer,
        'title': title,
        'sku': sku,
        'price': price,
        'in_stock': in_stock,
    }

    await context.push_data(data)
```

### Main file

Next, our main file becomes much simpler and cleaner:

```python title="src/main.py"
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler

from .routes import router


async def main() -> None:
    crawler = PlaywrightCrawler(
        # Let's limit our crawls to make our tests shorter and safer.
        max_requests_per_crawl=50,
        # Provide our router instance to the crawler.
        request_handler=router,
    )

    await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections'])


if __name__ == '__main__':
    asyncio.run(main())
```

By structuring your code this way, you achieve better separation of concerns, making the code easier to read, manage and extend. The `Router` class keeps your routing logic clean and modular, replacing if clauses with function decorators.

## Summary

Refactoring your crawler code with these practices enhances readability, maintainability, and scalability.

### Splitting your code into multiple files

There's no reason not to split your code into multiple files and keep your logic separate. Less code in a single file means less complexity to handle at any time, which improves overall readability and maintainability. Consider further splitting the routes into separate files for even better organization.

### Using a router to structure your crawling

Initially, using a simple `if` / `else` statement for selecting different logic based on the crawled pages might appear more readable. However, this approach can become cumbersome with more than two types of pages, especially when the logic for each page extends over dozens or even hundreds of lines of code.

It's good practice in any programming language to split your logic into bite-sized chunks that are easy to read and reason about. Scrolling through a thousand line long `request_handler()` where everything interacts with everything and variables can be used everywhere is not a beautiful thing to do and a pain to debug. That's why we prefer the separation of routes into their own files.

{/* TODO: write this once SDK v2 is ready

## Next steps

In the next and final step, you'll see how to deploy your Crawlee project to the cloud. If you used the CLI to bootstrap your project, you already have a **Dockerfile** ready, and the next section will show you how to deploy it to the [Apify Platform](../deployment/apify-platform) with ease.

*/}
